The Relationship Between the Bock-Aitkin Procedure and 
the EM Algorithm for IRT Model Estimation 



Yaowen Hsu 
Terry A. Ackerman 
Meichu Fan 



Abstract 



It has been previously shown that the Bock-Aitkin procedure is an instance of the EM algo- 
rithm when trying to find the marginal maximum likelihood estimate for a discrete latent ability 
variable (latent trait). In this paper, it is shown that the Bock-Aitkin procedure is a numerical 
implementation of the EM algorithm for a continuous latent ability variable using numerical 
quadrature. Further, the relationship between the EM algorithm, marginal maximum likelihood 
estimation, and the Bock-Aitkin procedure is described for both the discrete and continuous 
cases. Some issues concerning the use of the Bock-Aitkin procedure and the EM algorithm are 
addressed. 

Bock and Aitkin (1981) described a procedure for marginal maximum likelihood 
(MML) estimation. It can be shown that the Bock-Aitkin procedure is an instance of the EM 
algorithm, which was formulated by Dempster, Laird, and Rubin (1977). Along with the introduction 
of the software BILOG (Mislevy & Bock, 1993), the Bock-Aitkin procedure and the EM algorithm have 
become popular item parameter estimation approaches. For example, researchers have developed 
methods for using the Bock-Aitkin procedure under a variety of item response theory (IRT) models 
(Bock, Gibbons, & Muraki, 1988; Mislevy & Bock, 1985; Muraki, 1992; Muraki & Carlson, 
1995; Thissen, 1982). Other researchers have also developed procedures for using the EM 
algorithm for parameter estimation in IRT (Rigdon & Tsutakawa, 1983; Tsutakawa, 1984, 1985). 

However, in their paper Bock and Aitkin (1981) did not show how their procedure is an 
instance of the EM algorithm, although they mentioned that the procedure “is closely related to 
the EM algorithm” (p. 444) but “not the same as the general EM algorithm” (p. 448; also quoted 
in Lewis, 1985, p. 206). Lewis (1985) gave insightful comments and sketched a proof of the 
relationship between the EM algorithm and the Bock-Aitkin procedure with a general discrete 
distribution of the ability $\theta$. (Conventionally, $\theta$ denotes the ability or latent trait. 
In this paper, the discussion includes multiple abilities, and the $\theta$ distribution is referred 
to as unidimensional or multidimensional by context.) Harwell, Baker, and Zwarts (1988) provided 
a mathematical background for MML estimation and the Bock-Aitkin procedure, and demonstrated the 
procedure under the assumption of a discrete $\theta$ distribution. Rigdon and Tsutakawa (1983) and 
Tsutakawa (1984, 1985) showed how to use the EM algorithm to find MML estimates of the IRT model 
parameters (the EM algorithm for MML estimation is referred to here as EM/MMLE). Woodruff 
and Hanson (1996) described the EM algorithm for maximum likelihood (ML) parameter 
estimation for finite mixtures (i.e., when $\theta$ is discrete). They then related the Bock-Aitkin 
procedure to theirs. 

In spite of the various discussions relating to the Bock-Aitkin procedure and the EM 
algorithm, it still seems that the underlying mechanics of these two algorithms and their 
relationship are not well understood. A better understanding can clarify some issues. For example, 
is the underlying $\theta$ scale discrete or continuous when the Bock-Aitkin procedure is used? Bock 
and Aitkin (1981) heuristically described two approaches regarding the use of $\theta$ to implement 
their procedure. One approach uses Gauss-Hermite quadrature to compute the integral needed 
when $\theta$ has the normal distribution (here called BA1). Another uses a discrete distribution 
(here referred to as a discrete representation) to approximate a specified (continuous) $\theta$ 
distribution with a finite number of prespecified and equally spaced points (here called BA2). No 
integral is evaluated. The BA2 procedure is extended to the case where the $\theta$ distribution is 
unspecified and is estimated based on a discrete representation (Mislevy & Bock, 1985; here called 
BAM). In some literature (e.g., Mislevy & Stocking, 1989, p. 61), the discrete points and 
probabilities of a discrete representation are labeled as the quadrature points and weights, 
although the approaches differ; for example, the quadrature points in Gauss-Hermite quadrature are 
not equally spaced. To date, the Bock-Aitkin procedure has only been shown to be an instance of the 
EM algorithm by using a discrete representation, or in fact, by assuming that $\theta$ is discrete 
with a finite number of predetermined values (Harwell et al., 1988; Lewis, 1985; Woodruff & Hanson, 
1996). For example, Mislevy and Stocking (1989) stated that the steps of the Bock-Aitkin procedure 
are “exactly the steps of the EM algorithm (Dempster, Laird, & Rubin, 1977), in the special case 
of ‘missing multinomial indicators’ because $\theta$s are limited to a finite number of values” 
(p. 61). Then, because the EM algorithm can give MML estimates (Tsutakawa, 1985; also described 
later), we know that the estimates obtained from the Bock-Aitkin procedure are MML 
estimates (although this is not guaranteed in general). However, when the resulting item parameter 
estimates are applied to the estimation of $\theta$, the $\theta$ scale is usually assumed to be 
continuous rather than discrete. Moreover, the discrete argument cannot justify how the procedure 
in which Bock (1995) used constant 243-point fractional quadrature for any number of multiple 
abilities can give MML estimates (the method of adaptive quadrature is also used by Bock & 
Schilling, 1997, cited in Bock, 1997), unless, as with Gauss-Hermite quadrature, the nodes and 
weights of fractional quadrature are treated as a discrete representation rather than as elements 
for numerical integration (which means, however, that a three-point discrete distribution, or a 
three-point histogram, may be used to approximate a normal distribution). Strictly speaking, we may 
say that the BA1 procedure, in which the standard normal distribution of ability is assumed, has 
not yet been shown to be an implementation of MML estimation. As will be shown below, from a 
theoretical point of view the Bock-Aitkin procedure is a reformulation (or an instance) of EM/MMLE 
with a continuous or discrete $\theta$ distribution (note that the EM algorithm can be applied for a 
more general $\theta$ distribution). BA1, BA2, and BAM are the implementation versions of EM/MMLE 
for a continuous $\theta$ distribution. The use of fractional quadrature in the Bock-Aitkin procedure 
can be justified. If $\theta$ is discrete, then BAM is the same as EM/MMLE. 

Moreover, from an EM algorithm perspective, the Bock-Aitkin procedure can naturally 
include estimation of the $\theta$-distribution parameters as well as item parameters, as BAM does. 
Note that this simultaneous estimation is not joint ML (JML) estimation in which, for example, 
individual examinee $\theta$ values (instead of $\theta$-distribution parameters) are estimated. One 
feature of the Bock-Aitkin procedure is that it reduces some of the computational load of the EM 
algorithm. However, the Bock-Aitkin procedure, being an EM algorithm and hence inheriting the 
algorithm’s statistical properties, may not converge to the MML estimate. 

Therefore, unlike earlier work restricted to the case of discrete $\theta$, in this paper 
the relationship between both algorithms is described with a general $\theta$ distribution, although 
a continuous scale is emphasized. A parametric distribution of $\theta$ is assumed; that is, the 
(underlying) $\theta$ distribution is specified, with distribution parameters that are either known 
or unknown. BAM assumes a nonparametric/unspecified $\theta$ distribution, so the procedure is for 
semiparametric IRT models where the item response function (IRF) is specified but the $\theta$ 
distribution is not (see Holland, 1990; if both are not specified, then it is a nonparametric IRT 
model), and is beyond the scope of this paper. On the other hand, both BA2 and BAM use a discrete 
representation, so the $\theta$ distribution is considered to be discrete, at least for MML 
estimation. This is because in the course of MML estimation a discrete distribution is actually 
used, although it is an approximation of a continuous distribution (B. Hanson, personal 
communication, December 8, 1997). 

First, the MML estimation and the Bock-Aitkin procedure are described. Next, the EM 
algorithm is introduced and its application to IRT settings is given. Then, the relationship 
between the EM algorithm and the Bock-Aitkin procedure is described both when $\theta$ is discrete 
and when it is continuous. Finally, some issues regarding the use of the EM algorithm and the 
Bock-Aitkin procedure are discussed. 



Consider dichotomously scored items (binary items; i.e., an answer to the item is 
classified into one of two score categories; the category of interest is labeled “correct”, and 
“incorrect” otherwise). The arguments developed in this paper can be applied to polytomous items 
when each score category is treated as a “binary item” (see Muraki, 1992; Muraki & Carlson, 1995). 
Using conventional terminology, an item response refers to an item score. Let the random 
variable $U$ denote the item score, which takes on 1 if the answer is correct and 0 otherwise. A 
test $T$ of $k$ given and fixed binary items administered to a population $C$ of examinees is 
characterized by the random vector $\mathbf{U}$, which is the response pattern (or response vector), 
$\mathbf{U} = (U_1, U_2, \ldots, U_k)'$, where the prime ($'$) represents the matrix transpose. Note 
that as defined by Holland (1990), a test is “a specific set of questions with specific directions, 
given under standardized conditions of timing, item presentation, and so forth. If any of these 
elements change, the resulting test is different from T” (pp. 577-578). The examinee is 
characterized (or assigned) by the $m$-component random vector 
$\boldsymbol{\Theta} = (\Theta_1, \Theta_2, \ldots, \Theta_m)'$, where each $\Theta_h$ denotes an 
ability (or latent trait) measured by the test $T$, and $m$ is the number of dimensions of the 
complete latent space (of $\boldsymbol{\Theta}$), $1 \le m \le k$. The complete latent space refers 
to the minimum dimensionality necessary for local independence (for the test $T$) to hold; that is, 
$U_1, U_2, \ldots, U_k$ are stochastically conditionally independent, given that 
$\boldsymbol{\Theta}$ has the value $\boldsymbol{\theta}$. The assumption of local independence is 
sometimes referred to as the principle of local independence in the literature. Theoretically the 
$\theta$ scale can be of any type, although a continuous one is of main interest here. 

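An IRF is used to define an IRT model (latent trait model). The IRF for an item with 
respect to $T$ and $\boldsymbol{\Theta}$ (i.e., an item’s IRF may vary for different tests or 
examinee populations) is specified as the conditional probability of a correct response to the 
item, given $\boldsymbol{\theta}$: 

$$\Pr(U = 1 \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi}) \equiv \Pr\nolimits_{\boldsymbol{\phi}}(U = 1 \mid \boldsymbol{\Theta} = \boldsymbol{\theta}), \tag{1}$$

on the $m$-dimensional complete latent space, where $\boldsymbol{\phi}$ is the vector of item parameters. 

Given $\boldsymbol{\theta}$, the probability function of $U$ can be expressed as 

$$\Pr(U = u \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi}) = \Pr(U = 1 \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi})^{u} \, [1 - \Pr(U = 1 \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi})]^{1-u} \tag{2}$$

for $u = 0$ and 1; zero elsewhere. By local independence, the conditional probability of 
$\mathbf{U}$ for $k$ items, given $\boldsymbol{\theta}$, is 

$$\Pr(\mathbf{U} = \mathbf{u} \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi}) = \prod_{i=1}^{k} \Pr(U_i = u_i \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi}_i), \tag{3}$$

where $\boldsymbol{\phi} = (\boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \ldots, \boldsymbol{\phi}_k)$ is the item parameter matrix for the test $T$, and the components 
of the vector $\boldsymbol{\phi}_i$ for item $i$ depend on the particular model. 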
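
Equations 1 to 3 can be illustrated numerically. The sketch below assumes a unidimensional two-parameter logistic IRF, which is only one possible choice of model; the function names and the item parameter values are hypothetical and serve illustration only.

```python
import math
from itertools import product


def irf_2pl(theta, a, b):
    """Equation 1 under an assumed 2PL logistic form: Pr(U = 1 | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_prob(u, theta, a, b):
    """Equation 2: probability of item score u (0 or 1) given theta."""
    p = irf_2pl(theta, a, b)
    return p if u == 1 else 1.0 - p


def pattern_prob(pattern, theta, items):
    """Equation 3: product over items, which relies on local independence."""
    prob = 1.0
    for u, (a, b) in zip(pattern, items):
        prob *= item_prob(u, theta, a, b)
    return prob


# Hypothetical (a_i, b_i) pairs for a three-item test.
items = [(1.0, -0.5), (1.5, 0.0), (0.8, 1.0)]

# The 2^k conditional pattern probabilities must sum to one at any theta.
total = sum(pattern_prob(p, 0.3, items)
            for p in product((0, 1), repeat=len(items)))
```

Summing over all $2^k$ patterns at a fixed ability value is a quick check that the product in Equation 3 defines a proper conditional distribution.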

Note that the IRF is called the item characteristic curve (ICC) or trace line when $m = 1$. 
For a polytomous item, the IRF is the probability of a score category, given 
$\boldsymbol{\Theta} = \boldsymbol{\theta}$. Note that, as Cressie and Holland (1983) pointed out, 
because the abilities are not directly observable, an alternative way that an IRT model can be 
characterized is by using manifest probabilities, the proportions of examinees in a given 
population who obtain particular response patterns, $\Pr(\mathbf{U} = \mathbf{u})$ (also see 
Holland, 1981, 1990). This characterization also involves the specification of the $\theta$ 
distribution in addition to the IRF and explicitly expresses the assumption of local 
independence. Adams, Wilson, and Wu (1997) regarded this characterization as a structural 
measurement model, called a two-level nonlinear hierarchical item response model, in which the 
specification of an IRF is still referred to as a conditional item response model and the 
specification of the $\theta$ distribution is referred to as a population model. The manifest 
probabilities constitute the likelihood for MML estimation (see Equation 8). The first word 
“marginal” in the term MML can be dropped when this characterization is used (see Holland, 1990). 

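Consider a random sample of $n$ examinees from the population $C$. Let 

$$\mathbf{U} = (\mathbf{U}_1, \mathbf{U}_2, \ldots, \mathbf{U}_n)'$$

be the $n \times k$ response data (item score) matrix, consisting of response patterns 
$\mathbf{U}_j = (U_{j1}, U_{j2}, \ldots, U_{jk})'$, where $U_{ji}$ denotes the item score of 
examinee $j$ on item $i$. The $n \times m$ ability matrix of $n$ randomly selected examinees is 

$$\boldsymbol{\Theta} = \begin{pmatrix} \Theta_{11} & \cdots & \Theta_{1m} \\ \vdots & & \vdots \\ \Theta_{n1} & \cdots & \Theta_{nm} \end{pmatrix}.$$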
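For simplicity, let 

$$p_i(\boldsymbol{\theta}_j) = \Pr(U_{ji} = 1 \mid \boldsymbol{\Theta}_j = \boldsymbol{\theta}_j, \boldsymbol{\phi}_i). \tag{4}$$

The conditional probability that an $n \times k$ response data matrix $\mathbf{u}$ on a test will 
be produced by a group of $n$ examinees with abilities $\boldsymbol{\theta}_j$ who respond 
independently is 

$$\Pr(\mathbf{U} = \mathbf{u} \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi}) = \prod_{j=1}^{n} \Pr(\mathbf{U}_j = \mathbf{u}_j \mid \boldsymbol{\Theta}_j = \boldsymbol{\theta}_j, \boldsymbol{\phi}) = \prod_{j=1}^{n} \prod_{i=1}^{k} p_i(\boldsymbol{\theta}_j)^{u_{ji}} \, [1 - p_i(\boldsymbol{\theta}_j)]^{1 - u_{ji}}. \tag{5}$$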



If the values of $\boldsymbol{\theta}_1, \boldsymbol{\theta}_2, \ldots, \boldsymbol{\theta}_n$ of 
the $n$ particular examinees are of interest, that is, this group is the target population $C$, 
then $\boldsymbol{\theta}$ in the conditional probability of Equation 5 is regarded as a 
nonstochastic matrix of incidental (or person) parameters that can be estimated along with item 
parameters by observing the data matrix $\mathbf{u}$, since $\boldsymbol{\theta}$ is unknown but 
fixed. 

Both item and person parameters can be estimated using ML estimation. The likelihood 
function is the conditionally joint probability of Equation 5, that is, 

$$L(\boldsymbol{\theta}, \boldsymbol{\phi} \mid \mathbf{u}) = \Pr(\mathbf{U} = \mathbf{u} \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi}).$$

A set of ML estimates (MLEs) for $\boldsymbol{\phi}$ and $\boldsymbol{\theta}$ is obtained by 
simultaneously solving the likelihood equations, which are 

$$\frac{\partial \ln L(\boldsymbol{\theta}, \boldsymbol{\phi} \mid \mathbf{u})}{\partial \phi_{it}} = 0,$$

where $\phi_{it}$ is an element of $\boldsymbol{\phi}$, and 

$$\frac{\partial \ln L(\boldsymbol{\theta}, \boldsymbol{\phi} \mid \mathbf{u})}{\partial \theta_{jh}} = 0,$$

where $\theta_{jh}$ ($j = 1, \ldots, n$; $h = 1, \ldots, m$) is an element of $\boldsymbol{\theta}$ 
(there are $n \times m$ such equations). Because the system of likelihood equations is nonlinear, 
an iterative procedure is needed, for example, the Newton-Raphson method. 
However, there are some problems with this simultaneous estimation. For instance, it is 
impractical to compute the inverse of the Hessian matrix of second derivatives needed in the 
Newton-Raphson method when $n$ (the number of examinees) is large. 

Bock and Lieberman (1970) proposed a procedure, called MML estimation (Bock & 
Aitkin, 1981), to estimate item parameters for the normal ogive model. Using the estimated item 
parameters, the ability level of each examinee can then be estimated/predicted by, for example, 
the conditional probability of Equation 3 or the conditional distribution of 
$\boldsymbol{\Theta}$ given an observed response pattern $\mathbf{u}$ (see Holland, 1990). In the 
terminology used in linear models, MML estimation gives mixed-effect solutions by treating 
$\boldsymbol{\phi}$ as fixed but $\boldsymbol{\Theta}$ as random (Bock & Aitkin, 1981; 
Rigdon & Tsutakawa, 1983). 

In Bock and Lieberman’s paper the $\theta$ distribution is fixed to the standard normal 
distribution $N(0,1)$, hence no distribution parameter needs to be estimated. Bock (1997) stated 
that “Although this integral [i.e., Equation 6 below] includes the density function of the latent 
proficiency[/ability] distribution, the location and scale of that distribution may be absorbed in 
the item parameters and arbitrarily set to conventional values such as 0 and 1. Thus, if the latent 
distribution is indexed only by location and scale parameters, as is the normal distribution, it is 
fully specified and does not increase the number of parameters to be estimated” (p. 27). However, 
Bock (1997) mentioned that one of the requirements of practical testing programs is to estimate 
the $\theta$ distribution jointly with the item parameters (p. 27); or, “one of the strengths of 
IRT, compared to classical test theory, is that it provides estimation of the proficiency latent 
distribution that is largely invariant with respect to test length and item characteristics” 
(p. 29). In fact, in addition to item parameters, MML estimation as presented here naturally 
includes estimation of the $\theta$-distribution parameters for IRT models such as the 
one-parameter Rasch model and a model with a discrete $\theta$ scale. Bock (1997) pointed out the 
condition for this simultaneous estimation of both item parameters and $\theta$-distribution 
parameters. 

For a random sample of $n$ examinees from a population $C$, let 
$\boldsymbol{\Theta}_1, \boldsymbol{\Theta}_2, \ldots, \boldsymbol{\Theta}_n$ denote the examinees’ 
abilities, which are independent and identically distributed with the density function 
$g(\boldsymbol{\theta} \mid \boldsymbol{\beta})$, belonging to a family indexed by the vector of 
distribution parameters $\boldsymbol{\beta}$. The distribution is usually assumed to be fixed as 
normal $N(0,1)$ for the unidimensional case, in which case $\boldsymbol{\beta}$ is known. One 
possible formulation of the (marginal) probability function of the response pattern $\mathbf{u}$ is 

$$\Pr\nolimits_{\boldsymbol{\phi}, \boldsymbol{\beta}}(\mathbf{U} = \mathbf{u}) = \Pr(\mathbf{U} = \mathbf{u} \mid \boldsymbol{\phi}, \boldsymbol{\beta}) = \int f(\mathbf{u}, \boldsymbol{\theta} \mid \boldsymbol{\phi}, \boldsymbol{\beta}) \, d\boldsymbol{\theta} = \int \Pr(\mathbf{U} = \mathbf{u} \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi}) \, g(\boldsymbol{\theta} \mid \boldsymbol{\beta}) \, d\boldsymbol{\theta}. \tag{6}$$

As mentioned above, $\Pr(\mathbf{U} = \mathbf{u} \mid \boldsymbol{\phi}, \boldsymbol{\beta})$ is 
the manifest probability that Cressie and Holland (1983) and Holland (1981, 1990) used to 
characterize an IRT model. Moreover, the MML estimation procedure described here can be extended 
to other possible formulations of $\Pr(\mathbf{U} = \mathbf{u})$, which may have different or 
additional types of model parameters, as long as these parameters can be classified as item 
parameters or $\theta$-distribution parameters; for example, Adams, Wilson, and Wu (1997) 
introduced examinee demographic variables into the $\theta$ distribution. 

For $k$ items, there are $2^k$ possible response patterns. The $n$ randomly selected examinees 
will give $n$ (not necessarily distinct) observed response patterns. Let $\mathbf{u}_w$ denote the 
distinct response patterns that appear in a test, $w = 1, 2, \ldots, s$ and $s \le \min(n, 2^k)$. 
Let $r_w$ denote the number of examinees whose response pattern is $\mathbf{u}_w$, such that 
$\sum_{w=1}^{s} r_w = n$. 

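Note that if $\boldsymbol{\Theta}$ is discrete with $q$ possible values $\mathbf{x}_l$, 
$l = 1, \ldots, q$, then the $\theta$-distribution parameters are the $q$ probabilities associated 
with the values of $\mathbf{x}_l$, where 

$$\beta_l = g(\mathbf{x}_l \mid \beta_l) = \Pr(\boldsymbol{\Theta} = \mathbf{x}_l \mid \beta_l),$$

and $\sum_{l=1}^{q} \beta_l = 1$, and Equation 6 becomes 

$$\begin{aligned}
\Pr(\mathbf{U} = \mathbf{u} \mid \boldsymbol{\phi}, \boldsymbol{\beta}) &= \sum_{l=1}^{q} f(\mathbf{u}, \mathbf{x}_l \mid \boldsymbol{\phi}, \boldsymbol{\beta}) \\
&= \sum_{l=1}^{q} \Pr(\mathbf{U} = \mathbf{u} \mid \boldsymbol{\Theta} = \mathbf{x}_l, \boldsymbol{\phi}) \Pr(\boldsymbol{\Theta} = \mathbf{x}_l \mid \beta_l) \\
&= \sum_{l=1}^{q} \Pr(\mathbf{U} = \mathbf{u} \mid \boldsymbol{\Theta} = \mathbf{x}_l, \boldsymbol{\phi}) \, \beta_l,
\end{aligned} \tag{7}$$

where $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_q)'$. 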

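For (marginal) ML estimation of the parameters $\boldsymbol{\phi}$ and $\boldsymbol{\beta}$, the 
(marginal) likelihood function is the probability of observing a response data matrix 
$\mathbf{u}$, which is, for $\{r_1, r_2, \ldots, r_s\}$ being multinomially distributed, 

$$L = \Pr(\mathbf{U} = \mathbf{u} \mid \boldsymbol{\phi}, \boldsymbol{\beta}) = \frac{n!}{\prod_{w=1}^{s} r_w!} \prod_{w=1}^{s} [\Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi}, \boldsymbol{\beta})]^{r_w}. \tag{8}$$

The log-likelihood is, up to an additive constant independent of $\boldsymbol{\phi}$ and 
$\boldsymbol{\beta}$, 

$$\ln L = \sum_{w=1}^{s} r_w \ln \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi}, \boldsymbol{\beta}). \tag{9}$$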
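
The bookkeeping behind Equations 7 to 9 can be sketched in a few lines: examinees are collapsed into the $s$ distinct patterns with counts $r_w$, and the log-likelihood kernel is a weighted sum of log marginal probabilities. The sketch assumes a discrete, unidimensional $\theta$ and a two-parameter logistic IRF; the data, item parameters, points, and probabilities are made up for illustration.

```python
import math
from collections import Counter


def irf_2pl(theta, a, b):
    """Assumed 2PL logistic IRF, Pr(U_i = 1 | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def marginal_prob(pattern, items, points, probs):
    """Equation 7: sum over points x_l of Pr(U = u | x_l) * beta_l."""
    total = 0.0
    for x, beta in zip(points, probs):
        cond = 1.0
        for u, (a, b) in zip(pattern, items):
            p = irf_2pl(x, a, b)
            cond *= p if u == 1 else 1.0 - p
        total += cond * beta
    return total


def log_likelihood(data, items, points, probs):
    """Equation 9: sum over distinct patterns of r_w * ln Pr(U = u_w)."""
    counts = Counter(map(tuple, data))  # r_w for each distinct pattern u_w
    return sum(r_w * math.log(marginal_prob(u_w, items, points, probs))
               for u_w, r_w in counts.items())


# Hypothetical data: five examinees, two items.
data = [(1, 0), (1, 0), (0, 0), (1, 1), (1, 0)]
items = [(1.0, -0.5), (1.2, 0.5)]   # hypothetical (a_i, b_i)
points = [-1.0, 0.0, 1.0]           # hypothetical x_l
probs = [0.25, 0.5, 0.25]           # hypothetical beta_l, summing to 1

ll_patterns = log_likelihood(data, items, points, probs)
ll_examinees = sum(math.log(marginal_prob(u, items, points, probs))
                   for u in data)
```

Summing over distinct patterns gives the same value as summing over examinees, which is why only the $s$ observed patterns and their counts are needed.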
The MML estimates (MMLEs) are obtained by solving the likelihood equations, 

$$\frac{\partial \ln L}{\partial \phi_{it}} = 0, \tag{10}$$

and 

$$\frac{\partial \ln L}{\partial \beta_v} = 0, \tag{11}$$

where $\phi_{it}$ denotes an element of the item parameter matrix $\boldsymbol{\phi}$, and 
$\beta_v$ an element of the $\theta$-distribution parameter vector $\boldsymbol{\beta}$. 



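The first derivatives of $\ln L$ with respect to an item parameter and a 
$\theta$-distribution parameter are, respectively, 

$$\begin{aligned}
\frac{\partial \ln L}{\partial \phi_{it}} &= \sum_{w=1}^{s} \frac{r_w}{\Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi}, \boldsymbol{\beta})} \frac{\partial \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi}, \boldsymbol{\beta})}{\partial \phi_{it}} \\
&= \sum_{w=1}^{s} \int \frac{r_w \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi}) \, g(\boldsymbol{\theta} \mid \boldsymbol{\beta})}{\Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi}, \boldsymbol{\beta})} \left[ \frac{u_{wi} - p_i(\boldsymbol{\theta})}{p_i(\boldsymbol{\theta})(1 - p_i(\boldsymbol{\theta}))} \right] \frac{\partial p_i(\boldsymbol{\theta})}{\partial \phi_{it}} \, d\boldsymbol{\theta}
\end{aligned} \tag{12}$$

and 

$$\begin{aligned}
\frac{\partial \ln L}{\partial \beta_v} &= \sum_{w=1}^{s} \frac{r_w}{\Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi}, \boldsymbol{\beta})} \frac{\partial \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi}, \boldsymbol{\beta})}{\partial \beta_v} \\
&= \sum_{w=1}^{s} \int \frac{r_w \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi}) \, g(\boldsymbol{\theta} \mid \boldsymbol{\beta})}{\Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi}, \boldsymbol{\beta})} \frac{\partial \ln g(\boldsymbol{\theta} \mid \boldsymbol{\beta})}{\partial \beta_v} \, d\boldsymbol{\theta},
\end{aligned} \tag{13}$$

where $p_i(\boldsymbol{\theta}) = \Pr(U_i = 1 \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi}_i)$, and in Equation 12, 

$$\begin{aligned}
\frac{\partial \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi}, \boldsymbol{\beta})}{\partial \phi_{it}} &= \frac{\partial}{\partial \phi_{it}} \int \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi}) \, g(\boldsymbol{\theta} \mid \boldsymbol{\beta}) \, d\boldsymbol{\theta} \\
&= \int g(\boldsymbol{\theta} \mid \boldsymbol{\beta}) \frac{\partial \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi})}{\partial \phi_{it}} \, d\boldsymbol{\theta} \\
&= \int \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi}) \, g(\boldsymbol{\theta} \mid \boldsymbol{\beta}) \frac{\partial \ln \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi})}{\partial \phi_{it}} \, d\boldsymbol{\theta};
\end{aligned} \tag{14}$$

also, 

$$\begin{aligned}
\frac{\partial \ln \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi})}{\partial \phi_{it}} &= \frac{\partial \sum_{h=1}^{k} [u_{wh} \ln p_h(\boldsymbol{\theta}) + (1 - u_{wh}) \ln(1 - p_h(\boldsymbol{\theta}))]}{\partial \phi_{it}} \\
&= \frac{u_{wi}}{p_i(\boldsymbol{\theta})} \frac{\partial p_i(\boldsymbol{\theta})}{\partial \phi_{it}} + \frac{1 - u_{wi}}{1 - p_i(\boldsymbol{\theta})} \frac{\partial (1 - p_i(\boldsymbol{\theta}))}{\partial \phi_{it}} \\
&= \frac{u_{wi} - p_i(\boldsymbol{\theta})}{p_i(\boldsymbol{\theta})(1 - p_i(\boldsymbol{\theta}))} \frac{\partial p_i(\boldsymbol{\theta})}{\partial \phi_{it}},
\end{aligned} \tag{15}$$

and in Equation 13, 

$$\begin{aligned}
\frac{\partial \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi}, \boldsymbol{\beta})}{\partial \beta_v} &= \frac{\partial \int \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi}) \, g(\boldsymbol{\theta} \mid \boldsymbol{\beta}) \, d\boldsymbol{\theta}}{\partial \beta_v} \\
&= \int \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi}) \frac{\partial g(\boldsymbol{\theta} \mid \boldsymbol{\beta})}{\partial \beta_v} \, d\boldsymbol{\theta} \\
&= \int \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi}) \, g(\boldsymbol{\theta} \mid \boldsymbol{\beta}) \frac{\partial \ln g(\boldsymbol{\theta} \mid \boldsymbol{\beta})}{\partial \beta_v} \, d\boldsymbol{\theta}.
\end{aligned} \tag{16}$$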

Tsutakawa (1984, 1985) compared Equation 12 (MML estimation) with Equation 40 (EM 
algorithm) using the two-parameter logistic model. 

For the (unidimensional) two-parameter normal ogive model and the standard normal 
distribution of $\Theta$, Bock and Lieberman (1970) suggested using the Newton-Raphson method 
and Gauss-Hermite quadrature to solve the likelihood equations given in Equation 10. That is, 
Equation 12 (in which the vector $\boldsymbol{\theta}$ is replaced by the scalar $\theta$ and the 
known $\boldsymbol{\beta}$ is dropped) is approximated as 

$$\frac{\partial \ln L}{\partial \phi_{it}} \approx \sum_{w=1}^{s} \sum_{l=1}^{q} \frac{r_w \Pr(\mathbf{U} = \mathbf{u}_w \mid \Theta = x_l, \boldsymbol{\phi}) A(x_l)}{P(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi})} \left[ \frac{u_{wi} - p_i(x_l)}{p_i(x_l)(1 - p_i(x_l))} \right] \frac{\partial p_i(x_l)}{\partial \phi_{it}}, \tag{17}$$

where $\{x_l : l = 1, \ldots, q\}$ are tabled nodes on the $\theta$ scale and $\{A(x_l)\}$ are the 
associated weights in the Gauss-Hermite quadrature with appropriate transformation (see Bock & 
Lieberman, 1970), and 

$$P(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi}) = \sum_{l=1}^{q} \Pr(\mathbf{U} = \mathbf{u}_w \mid \Theta = x_l, \boldsymbol{\phi}) A(x_l).$$

They obtained stable estimates of item parameters for five items. However, “computational 
difficulties limit the approach to not more than 10 or 12 items” (Bock & Lieberman, 1970, p. 180). 
The computational demand arises through the need, in each iteration, for the inverse of a 
$2k \times 2k$ information matrix. Moreover, each element of the information matrix is the sum of 
$2^k$ terms, regardless of the value of $s$, and each of the $2^k$ terms involves evaluation of 
the integral. 
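
The “appropriate transformation” of Gauss-Hermite nodes and weights can be made concrete with the three-point rule, whose nodes and weights have closed forms. Tabled Gauss-Hermite rules target integrals weighted by $e^{-t^2}$; the sketch below assumes the usual change of variable $x = \sqrt{2}\,t$ and $A(x) = w/\sqrt{\pi}$ to obtain a rule for the standard normal density. The logistic IRF and all names are illustrative assumptions.

```python
import math

# Three-point Gauss-Hermite rule for integrals of exp(-t^2) * f(t).
t_nodes = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
t_weights = [math.sqrt(math.pi) / 6.0,
             2.0 * math.sqrt(math.pi) / 3.0,
             math.sqrt(math.pi) / 6.0]

# Change of variable so the rule integrates against the N(0, 1) density.
x_nodes = [math.sqrt(2.0) * t for t in t_nodes]        # -sqrt(3), 0, sqrt(3)
A = [w / math.sqrt(math.pi) for w in t_weights]        # 1/6, 2/3, 1/6


def expect(f):
    """Approximate E[f(Theta)] for Theta ~ N(0, 1) by sum_l f(x_l) A(x_l)."""
    return sum(a * f(x) for x, a in zip(x_nodes, A))


def irf_logistic(theta, b):
    """Assumed one-parameter logistic IRF, used only for illustration."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Marginal probability of a correct response to an item with b = 0.
p_marginal = expect(lambda x: irf_logistic(x, 0.0))
```

The transformed weights sum to one, which is what later allows them to be read as the probabilities of a three-point discrete distribution.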

Because of these problems, Bock and Aitkin (1981) reformulated Equation 17 and pre- 
sented a procedure (called the Bock-Aitkin procedure in this paper) of the EM-algorithm type. 
Harwell et al. (1988) stated that this reformulation “produces consistent item parameter esti- 
mates” (p.254). (Note that Mislevy and Stocking (1989, p.59) stated that the MML approach 
“yields consistent estimates of item parameters”, conditional on the veracity of the IRT model.) 
Moreover, Bock and Aitkin (1981, p.444) mentioned that by this reformulation, “a computation- 
ally feasible solution is possible for both small and large numbers of items,” and freed “from 
arbitrary assumptions about the distribution of ability in the population effectively sampled.” 

Bock-Aitkin Procedure 

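In the Bock-Aitkin procedure (Bock & Aitkin, 1981), the $\theta$ distribution is fixed at each 
estimation iteration. (Mislevy and Bock (1985) extended the procedure so that the distribution 
may vary for each iteration.) Assume that $\boldsymbol{\Theta}$ has the standard multivariate 
normal density $g(\boldsymbol{\theta})$; then the distribution parameters are known, and Equation 9 
becomes 

$$\ln L = \sum_{w=1}^{s} r_w \ln \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi}), \tag{18}$$

where $\Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi}) = \int \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi}) \, g(\boldsymbol{\theta}) \, d\boldsymbol{\theta}$. 
The approximation in Equation 17 of the first derivative of $\ln L$ with respect to an item 
parameter is reformulated, in terms of the $m \times 1$ vector $\boldsymbol{\theta}$, as 

$$\frac{\partial \ln L}{\partial \phi_{it}} \approx \sum_{l=1}^{q} \frac{r_{li} - n_l \, p_i(\mathbf{x}_l)}{p_i(\mathbf{x}_l)(1 - p_i(\mathbf{x}_l))} \frac{\partial p_i(\mathbf{x}_l)}{\partial \phi_{it}}, \tag{19}$$

where $m$-dimensional Gauss-Hermite quadrature with appropriate transformation (see Bock 
& Aitkin, 1981; Bock & Lieberman, 1970) is employed to evaluate the integral (note that 
by using the standard multivariate normal distribution, the abilities are assumed independent), 
$\{\mathbf{x}_l = (x_{l1}, x_{l2}, \ldots, x_{lm})' : l = 1, \ldots, q\}$ are quadrature nodes on 
the $m$-dimensional space, $\{A(\mathbf{x}_l) = A(x_{l1}) A(x_{l2}) \cdots A(x_{lm})\}$ are the 
associated weights, and 

$$n_l = \sum_{w=1}^{s} \frac{r_w \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\Theta} = \mathbf{x}_l, \boldsymbol{\phi}) A(\mathbf{x}_l)}{P(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi})}, \tag{20}$$

$$r_{li} = \sum_{w=1}^{s} \frac{u_{wi} \, r_w \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\Theta} = \mathbf{x}_l, \boldsymbol{\phi}) A(\mathbf{x}_l)}{P(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi})}, \tag{21}$$

$$P(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi}) = \sum_{l=1}^{q} \Pr(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\Theta} = \mathbf{x}_l, \boldsymbol{\phi}) A(\mathbf{x}_l). \tag{22}$$

A Bock-Aitkin procedure (the BA1 procedure) can then be briefly described as follows: 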

1. Using provisional values of item parameters and pre-specified nodes, compute 
$P(\mathbf{U} = \mathbf{u}_w \mid \boldsymbol{\phi})$ for response pattern $w$, $w = 1, 2, \ldots, s$. 

2. (E-step) Compute $n_l$ and $r_{li}$ for each item $i$ and each node $l$. 

3. (M-step) Obtain the item parameter estimates by solving the system of likelihood equations 
for all item parameters via Equation 19, where $n_l$ and $r_{li}$ are treated as constants. 

4. Go back to step 1 unless the convergence criterion is reached. 
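
The four steps can be sketched for a deliberately small case. The sketch below makes several assumptions that are not part of the procedure as described: a unidimensional one-parameter logistic IRF with only a difficulty $b_i$ per item, a three-point Gauss-Hermite rule rescaled to $N(0,1)$, a fixed number of cycles in place of a convergence criterion, and bisection swapped in for Newton-Raphson in the M-step. For this IRF, $\partial p_i / \partial b_i = -p_i(1 - p_i)$, so Equation 19 set to zero reduces to $\sum_l [r_{li} - n_l p_i(x_l)] = 0$. The pattern counts are made up.

```python
import math

# Three-point Gauss-Hermite rule rescaled to N(0, 1): nodes x_l, weights A(x_l).
NODES = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
WEIGHTS = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]


def p_correct(x, b):
    """Assumed one-parameter logistic IRF with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(x - b)))


def pattern_given_node(u, x, bs):
    """Pr(U = u | Theta = x) by local independence (Equation 3)."""
    prob = 1.0
    for u_i, b in zip(u, bs):
        p = p_correct(x, b)
        prob *= p if u_i == 1 else 1.0 - p
    return prob


def e_step(counts, bs):
    """Steps 1-2: Equation 22, then n_l (Eq. 20) and r_li (Eq. 21)."""
    q, k = len(NODES), len(bs)
    n = [0.0] * q
    r = [[0.0] * k for _ in range(q)]
    loglik = 0.0
    for u, r_w in counts.items():
        joint = [pattern_given_node(u, x, bs) * a
                 for x, a in zip(NODES, WEIGHTS)]
        marginal = sum(joint)                 # Equation 22
        loglik += r_w * math.log(marginal)    # Equation 18
        for l in range(q):
            post = r_w * joint[l] / marginal
            n[l] += post
            for i in range(k):
                r[l][i] += u[i] * post
    return n, r, loglik


def m_step_item(n, r_i):
    """Step 3: solve sum_l (r_li - n_l p_i(x_l)) = 0 for b_i by bisection."""
    def score(b):
        return sum(r_l - n_l * p_correct(x, b)
                   for x, n_l, r_l in zip(NODES, n, r_i))
    lo, hi = -10.0, 10.0          # score is increasing in b on this bracket
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bock_aitkin(counts, k, cycles=30):
    """Step 4 replaced by a fixed number of cycles, for simplicity."""
    bs = [0.0] * k
    history = []
    for _ in range(cycles):
        n, r, loglik = e_step(counts, bs)
        history.append(loglik)
        bs = [m_step_item(n, [r[l][i] for l in range(len(NODES))])
              for i in range(k)]
    return bs, history


# Hypothetical pattern counts r_w for a three-item test (65 examinees).
counts = {(0, 0, 0): 10, (1, 0, 0): 20, (1, 1, 0): 15, (0, 1, 0): 5,
          (1, 1, 1): 10, (1, 0, 1): 3, (0, 0, 1): 2}
bs, history = bock_aitkin(counts, k=3)
```

Because the same $n_l$ enter every item's M-step equation, items answered correctly more often receive smaller difficulty estimates, and the recorded log-likelihood of Equation 18 does not decrease from one cycle to the next.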

A comparison between the second line of Equation 13 and the second line of Equation 
12 leads to an extension of the Bock-Aitkin procedure where the $\theta$-distribution parameters 
can be estimated along with the item parameters. Note that Bock and Aitkin (1981) gave 
heuristic meanings of $n_l$ and $r_{li}$ by treating the quadrature points and weights 
$\{\mathbf{x}_l, A(\mathbf{x}_l)\}$ as a discrete distribution approximation to the standard normal 
distribution (because $\sum_l A(\mathbf{x}_l) = 1$, $A(\mathbf{x}_l)$ can be interpreted as the 
probability at point $\mathbf{x}_l$). However, the meanings will be obvious 
from an EM algorithm perspective discussed later. Bock and Aitkin also proposed to use a 
discrete distribution (or discrete representation) on a finite number of equally spaced points to 
approximate a continuous distribution of ability which has finite mean and variance (also see 
Mislevy & Bock, 1985, 1993). Bock and Aitkin (1981, pp. 449-450) described a process using 
this discrete approximation which can give a nonparametric estimate of the continuous (normal 
or any other) distribution of ability. It should be noted that this approximation is not for 
numerical quadrature to compute the integral, although the discrete points and probabilities are 
labeled as quadrature points and weights in some literature. In this paper, this discrete 
representation is treated as if the underlying $\theta$ distribution is discrete in item parameter 
estimation. 
EM Algorithm 

The name EM refers to the two steps of the algorithm: the expectation (E) step and the 
maximization (M) step. The EM algorithm (Dempster et al., 1977) is a general iterative process 
of ML estimation for incomplete data. The complete data contains two parts: one is the incomplete 
(observed) data, and the other can be referred to as missing data, which can be missing values, 
unobservable data, or parameters. 

Suppose there is a model for the complete data $Y$, with an associated probability or 
density function $f(y \mid \varphi)$, where $\varphi$ is the set of unknown parameters to be 
estimated. Let $Y_{\mathrm{obs}}$ represent the observed part of $Y$, and $Y_{\mathrm{mis}}$ denote 
the missing part. The complete data is given as $Y = (Y_{\mathrm{obs}}, Y_{\mathrm{mis}})$. The EM 
algorithm finds the MLE, $\hat{\varphi}_{\mathrm{MLE}}$, the value of $\varphi$ that maximizes the 
likelihood function based on the observed data $Y_{\mathrm{obs}}$, which is 

$$L(\varphi \mid y_{\mathrm{obs}}) = f(y_{\mathrm{obs}} \mid \varphi) = \int f(y_{\mathrm{obs}}, y_{\mathrm{mis}} \mid \varphi) \, dy_{\mathrm{mis}}. \tag{23}$$

However, the EM algorithm uses the complete data likelihood $f(y \mid \varphi)$ instead of 
$f(y_{\mathrm{obs}} \mid \varphi)$. 

When the complete data density (or probability) function $f(y \mid \varphi)$ has an 
exponential-family form, it is easy to implement the EM algorithm (Dempster et al., 1977). However, 
the “mixed effects” IRT models (i.e., $\boldsymbol{\theta}$ is random but $\boldsymbol{\phi}$ is 
fixed) generally do not possess the properties of the exponential families (Rigdon & Tsutakawa, 
1983). Then a more general form of the EM algorithm can be used, which is as follows: First, 
initial values $\varphi^{(0)}$ are established. Let $\varphi^{(t)}$ be the estimate of $\varphi$ at 
the $t$th iteration (also referred to as an EM cycle). Then the $(t+1)$th EM cycle 
can be expressed in two steps (Dempster et al., 1977): 
E-step: Compute $Q(\varphi \mid \varphi^{(t)})$, where 

$$Q(\varphi \mid \varphi^{(t)}) = E[\ln f(Y \mid \varphi) \mid y_{\mathrm{obs}}, \varphi^{(t)}] = \int \ln f(y \mid \varphi) \, f(y_{\mathrm{mis}} \mid y_{\mathrm{obs}}, \varphi^{(t)}) \, dy_{\mathrm{mis}}. \tag{24}$$

M-step: Find $\varphi^{(t+1)}$ that maximizes $Q(\varphi \mid \varphi^{(t)})$. 

In some cases the maximization is hard to attain; then a generalized EM (GEM) algorithm can be 
used, in which the M-step is to find $\varphi^{(t+1)}$ such that 

$$Q(\varphi^{(t+1)} \mid \varphi^{(t)}) \ge Q(\varphi^{(t)} \mid \varphi^{(t)}). \tag{25}$$

The EM algorithm converges reliably in the sense that no EM cycle decreases the 
likelihood $L(\varphi \mid y_{\mathrm{obs}})$, and if $\ln L(\varphi \mid y_{\mathrm{obs}})$ is 
bounded, then under certain conditions the sequence $\ln L(\varphi^{(t)} \mid y_{\mathrm{obs}})$ 
converges to a stationary value (i.e., one at which the first derivative is zero, but not 
necessarily a global or local maximum) of $\ln L(\varphi \mid y_{\mathrm{obs}})$ (see Dempster et 
al., 1977). However, it is well known that the EM algorithm is usually slow to converge; the rate 
of convergence can be painfully slow if the amount of missing data is large. Moreover, the 
convergence of the sequence $\ln L(\varphi^{(t)} \mid y_{\mathrm{obs}})$ under the EM or GEM 
algorithm may not imply the convergence of $\varphi^{(t)}$ (Wu, 1983). 
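
The statement that an EM or GEM cycle does not decrease the likelihood can be traced through a standard decomposition of the observed-data log-likelihood. The sketch below follows the usual argument associated with Dempster et al. (1977); the symbol $H$ is introduced here only for illustration and is not notation from this paper.

```latex
\begin{align*}
\ln L(\varphi \mid y_{\mathrm{obs}})
  &= Q(\varphi \mid \varphi^{(t)}) - H(\varphi \mid \varphi^{(t)}), \\
H(\varphi \mid \varphi^{(t)})
  &= E\bigl[\ln f(Y_{\mathrm{mis}} \mid y_{\mathrm{obs}}, \varphi)
     \,\big|\, y_{\mathrm{obs}}, \varphi^{(t)}\bigr], \\
H(\varphi \mid \varphi^{(t)})
  &\le H(\varphi^{(t)} \mid \varphi^{(t)})
  \quad \text{for all } \varphi \text{ (Jensen's inequality)}, \\
\ln L(\varphi^{(t+1)} \mid y_{\mathrm{obs}}) - \ln L(\varphi^{(t)} \mid y_{\mathrm{obs}})
  &= \bigl[Q(\varphi^{(t+1)} \mid \varphi^{(t)}) - Q(\varphi^{(t)} \mid \varphi^{(t)})\bigr] \\
  &\quad + \bigl[H(\varphi^{(t)} \mid \varphi^{(t)}) - H(\varphi^{(t+1)} \mid \varphi^{(t)})\bigr]
  \;\ge\; 0 .
\end{align*}
```

The first bracket is nonnegative by the M-step (or by Equation 25 for GEM), and the second by the inequality on $H$, so improving $Q$ is sufficient to avoid decreasing the observed-data likelihood.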



How the EM Algorithm Is Applied in IRT Settings 

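Consider $n$ examinees for a test of $k$ items. The response data matrix $\mathbf{U}$ is the 
incomplete data (i.e., $Y_{\mathrm{obs}}$). With the ability matrix $\boldsymbol{\Theta}$ as 
$Y_{\mathrm{mis}}$, $Y = (\mathbf{U}, \boldsymbol{\Theta})$ as in Equation 23 represents 
the complete data. Assume that 
$\{\boldsymbol{\Theta}_1, \boldsymbol{\Theta}_2, \ldots, \boldsymbol{\Theta}_n\}$ represents a 
random sample from a distribution whose density is $g(\boldsymbol{\theta} \mid \boldsymbol{\beta})$. 
Under the modeling of Equation 6, and by local independence, the joint distribution of 
$Y = (\mathbf{U}, \boldsymbol{\Theta})$ with parameters 
$\varphi = (\boldsymbol{\phi}, \boldsymbol{\beta})$ can be expressed as 

$$\begin{aligned}
f(y \mid \varphi) = f(\mathbf{u}, \boldsymbol{\theta} \mid \boldsymbol{\phi}, \boldsymbol{\beta}) &= \Pr(\mathbf{U} = \mathbf{u} \mid \boldsymbol{\Theta} = \boldsymbol{\theta}, \boldsymbol{\phi}) \, g(\boldsymbol{\theta} \mid \boldsymbol{\beta}) \\
&= \prod_{i=1}^{k} \prod_{j=1}^{n} \Pr(U_{ji} = u_{ji} \mid \boldsymbol{\Theta}_j = \boldsymbol{\theta}_j, \boldsymbol{\phi}_i) \prod_{j=1}^{n} g(\boldsymbol{\theta}_j \mid \boldsymbol{\beta}).
\end{aligned} \tag{26}$$

Using the EM algorithm (also see Rigdon & Tsutakawa, 1983; Tsutakawa, 1985), the $(t+1)$th 
iteration chooses $\varphi^{(t+1)}$ that maximizes the conditional expectation over 
$\boldsymbol{\Theta}$, 

$$\begin{aligned}
& Q(\varphi \mid \varphi^{(t)}) \\
&= E[\ln f(Y \mid \varphi) \mid y_{\mathrm{obs}}, \varphi^{(t)}] \\
&= E[\ln f(\mathbf{U}, \boldsymbol{\Theta} \mid \boldsymbol{\phi}, \boldsymbol{\beta}) \mid \mathbf{u}, \boldsymbol{\phi}^{(t)}, \boldsymbol{\beta}^{(t)}] \\
&= \sum_{j=1}^{n} E[\ln g(\boldsymbol{\Theta}_j \mid \boldsymbol{\beta}) \mid \mathbf{u}_j, \boldsymbol{\phi}^{(t)}, \boldsymbol{\beta}^{(t)}] + \sum_{i=1}^{k} \sum_{j=1}^{n} E[\ln \Pr(U_{ji} = u_{ji} \mid \boldsymbol{\Theta}_j, \boldsymbol{\phi}_i) \mid \mathbf{u}_j, \boldsymbol{\phi}^{(t)}, \boldsymbol{\beta}^{(t)}] \\
&= \sum_{j=1}^{n} E[\ln g(\boldsymbol{\Theta} \mid \boldsymbol{\beta}) \mid \mathbf{u}_j, \boldsymbol{\phi}^{(t)}, \boldsymbol{\beta}^{(t)}] + \sum_{i=1}^{k} \sum_{j=1}^{n} E[\ln \Pr(U_i = u_{ji} \mid \boldsymbol{\Theta}, \boldsymbol{\phi}_i) \mid \mathbf{u}_j, \boldsymbol{\phi}^{(t)}, \boldsymbol{\beta}^{(t)}].
\end{aligned} \tag{27}$$

Equation 26 is used in going from the third to the fourth line of Equation 27. In going from line 
4 to line 5, the subscript $j$ of $\boldsymbol{\Theta}_j$ can be dropped, because all 
$\boldsymbol{\Theta}_j$ are identically distributed. Then, starting with $\varphi^{(0)}$, the 
sequence $\varphi^{(1)}, \varphi^{(2)}, \ldots$ is found. If this sequence converges to 
$\varphi^*$, then $\varphi^*$ is the maximum likelihood estimate $\hat{\varphi}_{\mathrm{MLE}}$ of 
$\varphi$, under regularity conditions. Note that theoretically the $\theta$ distribution in 
Equation 27 can be more general, not restricted to the discrete or continuous type. 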

The maximization of Equation 27 can be done by maximizing each of its terms, and each 
term involves only one kind of parameter; hence both $\boldsymbol{\phi}^{(t+1)}$ and 
$\boldsymbol{\beta}^{(t+1)}$ can be found separately. 

Moreover, the maximization of the second term with respect to $\boldsymbol{\phi}$ may be performed 
for each $\boldsymbol{\phi}_i$ in the inner summation over $j$ through the usual optimization 
procedure, for example, the Newton-Raphson method. That is, $\boldsymbol{\phi}_i^{(t+1)}$ is chosen 
to maximize 

$$\sum_{j=1}^{n} E[\ln \Pr(U_i = u_{ji} \mid \boldsymbol{\Theta}, \boldsymbol{\phi}_i) \mid \mathbf{u}_j, \boldsymbol{\phi}^{(t)}, \boldsymbol{\beta}^{(t)}] \tag{28}$$

with respect to $\boldsymbol{\phi}_i$ for each item $i$. 

Under regularity conditions, the convergent estimate $\varphi^*$ of the EM algorithm also 
maximizes 

$$\prod_{j=1}^{n} \Pr(\mathbf{U}_j = \mathbf{u}_j \mid \boldsymbol{\phi}, \boldsymbol{\beta}), \tag{29}$$

which is proportional to the marginal likelihood function (see Equation 8), so $\varphi^*$ is the 
MMLE of $\varphi$. As mentioned in the previous section, it should be noted that convergence of the 
EM algorithm is not guaranteed. Even if the EM algorithm converges to the EM estimate $\varphi^*$, 
the EM estimate still may not be the MMLE (see Tsutakawa, 1985, and Lewis, 1985, for further 
discussion of the convergence of the EM algorithm and the MMLE). 



The Relationship Between the Bock-Aitkin Procedure and the EM Algorithm 



Discrete $\theta$ 



The demonstration is from a probit analysis or bioassay solution perspective (Bock & 
Aitkin, 1981; Harwell et al., 1988). For a discrete $\boldsymbol{\Theta}$, to show that the 
Bock-Aitkin procedure is an instance of the EM algorithm (R. K. Tsutakawa, personal communication, 
September 23, 1993), it is assumed that the possible ability levels are finite in number, 
$\{\boldsymbol{\theta}_l = \mathbf{x}_l = (x_{l1}, x_{l2}, \ldots, x_{lm})' : l = 1, 2, \ldots, q\}$, 
with a known distribution of probabilities $\beta_1 = A(\mathbf{x}_1)$, 
$\beta_2 = A(\mathbf{x}_2)$, $\ldots$, $\beta_q = A(\mathbf{x}_q)$, with $\sum_l \beta_l = 1$. (By 
observing Equation 27, that is, from an EM algorithm perspective, the assumption of known 
probabilities can be relaxed as discussed later; also see Bock & Aitkin, 1981; Mislevy & Bock, 
1985; and Woodruff & Hanson, 1996.) Examinees are assumed to be grouped into homogeneous groups, 
each with a distinct ability level $\mathbf{x}_l$, so that the complete data likelihood function 
has a form that is similar to Equation 26 where each $g(\boldsymbol{\theta}_j \mid \boldsymbol{\beta})$ 
is replaced by $A(\boldsymbol{\theta}_j)$, that is, 

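$$
\begin{aligned}
f(u, \theta \mid \phi, \beta) &= \Pr(U = u \mid \Theta = \theta, \phi)\, \Pr(\Theta = \theta \mid \beta) \\
&= \prod_{i=1}^{k} \prod_{j=1}^{n} \Pr(U_{ji} = u_{ji} \mid \Theta_j = \theta_j, \phi_i) \prod_{j=1}^{n} A(\theta_j) \\
&= \prod_{l=1}^{q} \prod_{i=1}^{k} \left[\Pr(U_i = 1 \mid \Theta = x_l, \phi_i)\right]^{r_{li}} \left[1 - \Pr(U_i = 1 \mid \Theta = x_l, \phi_i)\right]^{n_l - r_{li}} \prod_{l=1}^{q} \left[A(x_l)\right]^{n_l} \\
&= f(n, r \mid \phi, \beta),
\end{aligned} \tag{30}
$$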



where $n = (n_1, n_2, \ldots, n_q)'$, $n_l = n_l(\theta) = \sum_{j=1}^{n} 1_{\{\theta_j = x_l\}}$ denotes the number of examinees with ability $x_l$ and $\sum_{l=1}^{q} n_l = n$, and $1_{\{\theta_j = x_l\}} = 1$ if examinee $j$’s $\theta$ value is $x_l$, 0 otherwise; $r = \{r_{li}\}$, $r_{li} = r_{li}(u, \theta) = \sum_{j=1}^{n} 1_{\{\theta_j = x_l\}} 1_{\{u_{ji} = 1\}} = \sum_{j=1}^{n} u_{ji} 1_{\{\theta_j = x_l\}}$, denotes the number of examinees with ability $x_l$ who answer item $i$ correctly, and $\sum_{l=1}^{q} r_{li} = \sum_{j=1}^{n} u_{ji}$. Moreover,

$$\ln f(n, r \mid \phi) = \sum_{i=1}^{k} \sum_{l=1}^{q} \left[ r_{li} \ln p_i(x_l) + (n_l - r_{li}) \ln(1 - p_i(x_l)) \right] + \sum_{l=1}^{q} n_l \ln A(x_l). \tag{31}$$

$\beta$ has been dropped in Equation 31, because it is known. For the complete data (i.e., assuming that the ability of each individual is known), $\{n, r\}$ is observed. For the incomplete data, because each randomly selected examinee’s ability $\theta_j$ is unknown, $\{n_l\}$ and $\{r_{li}\}$ are unobserved.
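
The complete-data quantities can be illustrated with a short sketch. It tabulates $n_l$ and $r_{li}$ when every examinee's ability level is known and then evaluates Equation 31. The two-parameter logistic response function, the function names, and the toy data are assumptions made for illustration only; they are not taken from the paper.

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic response function, assumed here only for illustration."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def complete_data_counts(responses, level_index, q):
    """Tabulate n_l and r_li when every examinee's ability level is known."""
    k = len(responses[0])
    n = [0] * q
    r = [[0] * k for _ in range(q)]
    for u_j, l in zip(responses, level_index):
        n[l] += 1
        for i, u_ji in enumerate(u_j):
            r[l][i] += u_ji
    return n, r

def complete_data_loglik(n, r, levels, probs, items):
    """Equation 31: grouped binomial part plus the ability-distribution part."""
    total = 0.0
    for l, x_l in enumerate(levels):
        for i, (a, b) in enumerate(items):
            p = p_2pl(x_l, a, b)
            total += r[l][i] * math.log(p) + (n[l] - r[l][i]) * math.log(1.0 - p)
        total += n[l] * math.log(probs[l])
    return total

# Hypothetical toy data: 4 examinees, 2 items, 2 ability levels.
responses = [[1, 0], [1, 1], [0, 0], [1, 1]]
level_index = [0, 1, 0, 1]          # index l of each examinee's (known) ability level
levels, probs = [-1.0, 1.0], [0.5, 0.5]
items = [(1.0, 0.0), (1.5, 0.5)]    # hypothetical (a_i, b_i) pairs

n, r = complete_data_counts(responses, level_index, q=len(levels))
loglik = complete_data_loglik(n, r, levels, probs, items)
```

Because the grouped form only regroups the examinee-level terms, `loglik` agrees with the sum of the individual log-probabilities; this regrouping is what makes the complete-data problem look like a probit or logit analysis.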

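Using the EM algorithm, given some provisional value $\phi^{(t)}$ of $\phi$, we want to find $\phi^{(t+1)}$ which maximizes

$$Q(\phi \mid \phi^{(t)}) = E\left[\ln f(u, \Theta \mid \phi) \mid u, \phi^{(t)}\right] = E\left[\ln f(N, R \mid \phi) \mid u, \phi^{(t)}\right],$$

that is, by Equation 31, maximizes

$$E\left\{ \sum_{i=1}^{k} \sum_{l=1}^{q} \left[ R_{li} \ln \Pr(U_i = 1 \mid \Theta = x_l, \phi_i) + (N_l - R_{li}) \ln\left(1 - \Pr(U_i = 1 \mid \Theta = x_l, \phi_i)\right) \right] \,\Big|\, u, \phi^{(t)} \right\}, \tag{32}$$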



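where $N$ and $R$ are random, and

$$\bar{n}_l^{(t)} = E\left[N_l \mid u, \phi^{(t)}\right] = \sum_{j=1}^{n} \Pr(\Theta_j = x_l \mid u_j, \phi^{(t)}) = \sum_{w=1}^{s} \frac{r_w \Pr(U = u_w \mid \Theta = x_l, \phi^{(t)})\, A(x_l)}{\Pr(U = u_w \mid \phi^{(t)})}, \tag{33}$$

$$\bar{r}_{li}^{(t)} = E\left[R_{li} \mid u, \phi^{(t)}\right] = \sum_{j=1}^{n} u_{ji} \Pr(\Theta_j = x_l \mid u_j, \phi^{(t)}) = \sum_{w=1}^{s} u_{wi}\, r_w \Pr(\Theta = x_l \mid u_w, \phi^{(t)}) = \sum_{w=1}^{s} \frac{u_{wi}\, r_w \Pr(U = u_w \mid \Theta = x_l, \phi^{(t)})\, A(x_l)}{\Pr(U = u_w \mid \phi^{(t)})}, \tag{34}$$

$$\Pr(U = u_w \mid \phi^{(t)}) = \sum_{l=1}^{q} \Pr(U = u_w \mid \Theta = x_l, \phi^{(t)})\, A(x_l). \tag{35}$$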
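
As an illustration of Equations 33 to 35, the following sketch computes the expected counts by pooling examinees over distinct response patterns. The two-parameter logistic model, the function names, and the data are hypothetical choices made for the example, not the authors' implementation.

```python
import math
from collections import Counter

def p_2pl(theta, a, b):
    """Two-parameter logistic response function (an assumption of this sketch)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def pattern_likelihood(u, theta, items):
    """Pr(U = u | Theta = theta, phi) under local independence."""
    like = 1.0
    for u_i, (a, b) in zip(u, items):
        p = p_2pl(theta, a, b)
        like *= p if u_i == 1 else 1.0 - p
    return like

def e_step(responses, levels, probs, items):
    """Expected counts of Equations 33 and 34, pooled over distinct patterns."""
    patterns = Counter(tuple(u) for u in responses)   # r_w for each pattern u_w
    q, k = len(levels), len(items)
    n_bar = [0.0] * q
    r_bar = [[0.0] * k for _ in range(q)]
    for u_w, r_w in patterns.items():
        joint = [pattern_likelihood(u_w, x, items) * A for x, A in zip(levels, probs)]
        marginal = sum(joint)                         # Equation 35
        for l in range(q):
            post = joint[l] / marginal                # Pr(Theta = x_l | u_w, phi)
            n_bar[l] += r_w * post
            for i in range(k):
                r_bar[l][i] += r_w * u_w[i] * post
    return n_bar, r_bar

# Hypothetical data: 5 examinees, 2 items, 3 ability levels.
responses = [[1, 0], [1, 1], [0, 0], [1, 1], [1, 0]]
levels, probs = [-1.0, 0.0, 1.0], [0.25, 0.5, 0.25]
items = [(1.0, -0.5), (1.2, 0.5)]
n_bar, r_bar = e_step(responses, levels, probs, items)
```

Two sanity checks follow from the definitions: the expected sample sizes sum to the number of examinees, and for each item the expected correct counts sum to the observed number of correct responses.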

In the M-step of the $(t+1)$th EM cycle, the maximization of $Q(\phi \mid \phi^{(t)})$ with respect to $\phi = (\phi_1, \ldots, \phi_k)$ reduces to maximizing the inner summation of Equation 32 for each item $i$, separately. For each item, the (complete data) likelihood equations have the form

$$\sum_{l=1}^{q} \frac{\bar{r}_{li}^{(t)} - \bar{n}_l^{(t)}\, p_i(x_l)}{p_i(x_l)\,(1 - p_i(x_l))} \frac{\partial p_i(x_l)}{\partial \phi_{iv}} = 0, \tag{36}$$

where $\phi_{iv}$ is an element of $\phi_i$. The maximization of the M-step, then, gives $\phi^{(t+1)}$.
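
A minimal sketch of the M-step for one item is given below. It assumes a logistic response function in slope-intercept form, $p_i(x) = 1/(1 + e^{-(a x + c)})$, for which $\partial p_i/\partial c = p_i(1 - p_i)$ and $\partial p_i/\partial a = x\, p_i(1 - p_i)$, so that Equation 36 reduces to the two equations $\sum_l (\bar{r}_{li} - \bar{n}_l p_i(x_l)) = 0$ and $\sum_l (\bar{r}_{li} - \bar{n}_l p_i(x_l))\, x_l = 0$. The parameterization, function name, and data are assumptions of the example.

```python
import math

def m_step_item(n_bar, r_bar_i, levels, a=1.0, c=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson solution of Equation 36 for one item, logistic model assumed."""
    for _ in range(max_iter):
        g_a = g_c = h_aa = h_ac = h_cc = 0.0
        for n_l, r_l, x in zip(n_bar, r_bar_i, levels):
            p = 1.0 / (1.0 + math.exp(-(a * x + c)))
            resid = r_l - n_l * p            # numerator of Equation 36
            w = n_l * p * (1.0 - p)          # information weight at this level
            g_a += resid * x
            g_c += resid
            h_aa += w * x * x
            h_ac += w * x
            h_cc += w
        det = h_aa * h_cc - h_ac * h_ac
        step_a = (h_cc * g_a - h_ac * g_c) / det   # solve the 2 x 2 Newton system
        step_c = (h_aa * g_c - h_ac * g_a) / det
        a, c = a + step_a, c + step_c
        if max(abs(step_a), abs(step_c)) < tol:
            break
    return a, c

# Hypothetical expected counts generated exactly from a = 1.3, c = -0.4.
levels = [-2.0, -1.0, 0.0, 1.0, 2.0]
n_bar = [10.0, 20.0, 40.0, 20.0, 10.0]
r_bar_i = [n_l / (1.0 + math.exp(-(1.3 * x - 0.4))) for n_l, x in zip(n_bar, levels)]
a_hat, c_hat = m_step_item(n_bar, r_bar_i, levels)
```

Since the expected counts in this toy case were generated without noise, the iteration should recover the generating slope and intercept; with real expected counts the result is only the maximizer for the current EM cycle.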

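For the unidimensional case, comparing Equation 33 with Equation 20, Equation 34 with Equation 21, and Equation 35 with Equation 22 gives, without the superscript $(t)$,

$$\Pr(U = u_w \mid \phi) = P(U = u_w \mid \phi),$$

together with the corresponding identities for $\bar{n}_l$ and $\bar{r}_{li}$.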

Hence, $\bar{r}_{li}$ can be interpreted as the conditional expected frequency of correct response to item $i$ at $\theta = x_l$, and $\bar{n}_l$ as the conditional expected sample size at level $x_l$ (see also Bock & Aitkin, 1981). Comparing Equations 19 and 36, a relationship is observed that when the ability values are finite with known distribution (e.g., if ability levels are assumed to be equal to those nodes in the Gauss-Hermite quadrature used in Equation 19, and probabilities of ability levels are assumed to be equal to the weights associated with nodes), then the Bock-Aitkin procedure (or the BA1 procedure) and the EM algorithm give the same result.
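
To make the identification of quadrature nodes with ability levels concrete, the sketch below builds a five-point Gauss-Hermite rule rescaled to the standard normal density, so that the nodes play the role of $x_l$ and the weights the role of $A(x_l)$. The closed-form nodes and weights are the standard five-point values; the function name and the choice of five points are assumptions of the example.

```python
import math

def gauss_hermite_5():
    """Five-point Gauss-Hermite rule rescaled so the weights are N(0, 1) probabilities."""
    s10 = math.sqrt(10.0)
    inner, outer = math.sqrt(5.0 - s10), math.sqrt(5.0 + s10)
    w_inner, w_outer = (7.0 + 2.0 * s10) / 60.0, (7.0 - 2.0 * s10) / 60.0
    nodes = [-outer, -inner, 0.0, inner, outer]
    weights = [w_outer, w_inner, 8.0 / 15.0, w_inner, w_outer]
    return nodes, weights

levels, probs = gauss_hermite_5()    # x_l and A(x_l) of the discrete representation
mean = sum(A * x for x, A in zip(levels, probs))
variance = sum(A * x * x for x, A in zip(levels, probs)) - mean ** 2
fourth_moment = sum(A * x ** 4 for x, A in zip(levels, probs))
```

The resulting discrete distribution has probabilities summing to one and reproduces the low-order moments of the standard normal distribution, which is why it can stand in for a known ability distribution in the discrete derivation.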

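Note that instead of Equation 30, Harwell et al. (1988) showed the relationship using

$$
\begin{aligned}
\Pr(N = n, R = r \mid \phi, \beta) &= \Pr(R = r \mid N = n, \phi, \beta)\, \Pr(N = n \mid \phi, \beta) \\
&= \left\{ \prod_{l=1}^{q} \prod_{i=1}^{k} \binom{n_l}{r_{li}} \left[\Pr(U_i = 1 \mid \Theta = x_l, \phi_i)\right]^{r_{li}} \left[1 - \Pr(U_i = 1 \mid \Theta = x_l, \phi_i)\right]^{n_l - r_{li}} \right\} \frac{n!}{\prod_{l=1}^{q} n_l!} \prod_{l=1}^{q} \left[A(x_l)\right]^{n_l},
\end{aligned}
$$

which is proportional to $f(n, r \mid \phi, \beta)$. The constant of proportionality is positive and does not depend on model parameters $\phi$ and $\beta$.

Continuous $\theta$

An EM algorithm perspective is taken. Although a continuous $\theta$ scale is used for demonstration, it is also true for the discrete case (the integration is changed to the summation). With a continuous $\theta$ scale, following the EM algorithm, for each item $i$ Equation 28 becomes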





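$$
\begin{aligned}
A_i^{(t)} &= \sum_{j=1}^{n} E\left(\ln \Pr(U_i = u_{ji} \mid \Theta, \phi_i) \mid u_j, \phi^{(t)}, \beta^{(t)}\right) \\
&= \sum_{j=1}^{n} \int \ln \Pr(U_i = u_{ji} \mid \Theta = \theta, \phi_i)\, \pi(\theta \mid u_j, \phi^{(t)}, \beta^{(t)})\, d\theta \\
&= \int \sum_{j=1}^{n} \left[ u_{ji} \ln \Pr(U_i = 1 \mid \Theta = \theta, \phi_i) + (1 - u_{ji}) \ln\left(1 - \Pr(U_i = 1 \mid \Theta = \theta, \phi_i)\right) \right] \pi(\theta \mid u_j, \phi^{(t)}, \beta^{(t)})\, d\theta \\
&= \int \ln \Pr(U_i = 1 \mid \Theta = \theta, \phi_i) \sum_{j=1}^{n} \left[ u_{ji}\, \pi(\theta \mid u_j, \phi^{(t)}, \beta^{(t)}) \right] d\theta \\
&\quad + \int \ln\left(1 - \Pr(U_i = 1 \mid \Theta = \theta, \phi_i)\right) \sum_{j=1}^{n} \left[ (1 - u_{ji})\, \pi(\theta \mid u_j, \phi^{(t)}, \beta^{(t)}) \right] d\theta \\
&= \int \left[ \ln \Pr(U_i = 1 \mid \Theta = \theta, \phi_i)\, r_i(\theta)^{(t)} + \ln\left(1 - \Pr(U_i = 1 \mid \Theta = \theta, \phi_i)\right) \left(n(\theta)^{(t)} - r_i(\theta)^{(t)}\right) \right] d\theta,
\end{aligned} \tag{37}
$$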



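where

$$n(\theta)^{(t)} = \sum_{j=1}^{n} \pi(\theta \mid u_j, \phi^{(t)}, \beta^{(t)}), \tag{38}$$

$$r_i(\theta)^{(t)} = \sum_{j=1}^{n} u_{ji}\, \pi(\theta \mid u_j, \phi^{(t)}, \beta^{(t)}), \tag{39}$$

and the conditional density function $\pi(\theta \mid u_j, \phi^{(t)}, \beta^{(t)})$ is given by

$$\pi(\theta \mid u_j, \phi^{(t)}, \beta^{(t)}) = \frac{\Pr(U = u_j \mid \Theta = \theta, \phi^{(t)})\, g(\theta \mid \beta^{(t)})}{\int \Pr(U = u_j \mid \Theta = \theta, \phi^{(t)})\, g(\theta \mid \beta^{(t)})\, d\theta}.$$

In going from line 2 to line 3 of Equation 37, Equation 2 is used.
The first derivative of $A_i^{(t)}$ with respect to $\phi_{iv}$ is

$$\frac{\partial A_i^{(t)}}{\partial \phi_{iv}} = \int \frac{r_i(\theta)^{(t)} - n(\theta)^{(t)} \Pr(U_i = 1 \mid \Theta = \theta, \phi_i)}{\Pr(U_i = 1 \mid \Theta = \theta, \phi_i)\left(1 - \Pr(U_i = 1 \mid \Theta = \theta, \phi_i)\right)} \frac{\partial \Pr(U_i = 1 \mid \Theta = \theta, \phi_i)}{\partial \phi_{iv}}\, d\theta. \tag{40}$$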

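Let $B^{(t)}$ be the first term of the right hand side of line 5 of Equation 27,

$$B^{(t)} = \int \sum_{j=1}^{n} \left[ \ln g(\theta \mid \beta)\, \pi(\theta \mid u_j, \phi^{(t)}, \beta^{(t)}) \right] d\theta = \int \left(\ln g(\theta \mid \beta)\right) n(\theta)^{(t)}\, d\theta.$$

The first derivative of $B^{(t)}$ with respect to $\beta_v$, an element of $\beta$, is

$$\frac{\partial B^{(t)}}{\partial \beta_v} = \int \frac{n(\theta)^{(t)}}{g(\theta \mid \beta)} \frac{\partial g(\theta \mid \beta)}{\partial \beta_v}\, d\theta. \tag{41}$$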



Note that Equation 40 (from the EM algorithm) and Equation 19 (from the Bock-Aitkin procedure) are similar; they are the same, if $\Theta$ has the standard multivariate normal distribution and the $m$-dimensional Gauss-Hermite quadrature is used (compare also with Equation 12 in Bock and Aitkin’s 1981 paper).



Moreover, the $\theta$-distribution parameters can be estimated along with item parameters by simultaneously solving EM-likelihood equations 40 and 41. For example, a Bock-Aitkin procedure (BAHAF) is described as follows, with the standard multivariate normal distribution of $\Theta$: Using multidimensional Gauss-Hermite quadrature,



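$$
\begin{aligned}
\frac{\partial A_i^{(t)}}{\partial \phi_{iv}}
&= \int \frac{r_i(\theta)^{(t)} - n(\theta)^{(t)}\, p_i(\theta)}{p_i(\theta)\,(1 - p_i(\theta))} \frac{\partial p_i(\theta)}{\partial \phi_{iv}}\, d\theta \\
&= \int \frac{\displaystyle \sum_{j=1}^{n} \frac{u_{ji}\, p_j^{(t)}(\theta)\, g(\theta \mid \beta^{(t)})}{\int p_j^{(t)}(\theta)\, g(\theta \mid \beta^{(t)})\, d\theta} - p_i(\theta) \sum_{j=1}^{n} \frac{p_j^{(t)}(\theta)\, g(\theta \mid \beta^{(t)})}{\int p_j^{(t)}(\theta)\, g(\theta \mid \beta^{(t)})\, d\theta}}{p_i(\theta)\,(1 - p_i(\theta))} \frac{\partial p_i(\theta)}{\partial \phi_{iv}}\, d\theta \\
&\approx \sum_{l=1}^{q} \frac{\bar{r}_{li}^{(t)} - \bar{n}_l^{(t)}\, p_i(x_l)}{p_i(x_l)\,(1 - p_i(x_l))} \frac{\partial p_i(x_l)}{\partial \phi_{iv}}
\end{aligned} \tag{42}
$$

and

$$
\begin{aligned}
\frac{\partial B^{(t)}}{\partial \beta_v}
&= \int \frac{\displaystyle \sum_{j=1}^{n} \frac{p_j^{(t)}(\theta)\, g(\theta \mid \beta^{(t)})}{\int p_j^{(t)}(\theta)\, g(\theta \mid \beta^{(t)})\, d\theta}}{g(\theta \mid \beta)} \frac{\partial g(\theta \mid \beta)}{\partial \beta_v}\, d\theta \\
&\approx \sum_{l=1}^{q} \frac{\bar{n}_l^{(t)}}{g(x_l \mid \beta)} \frac{\partial g(x_l \mid \beta)}{\partial \beta_v},
\end{aligned} \tag{43}
$$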



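where $p_j^{(t)}(\theta) = \Pr(U = u_j \mid \Theta = \theta, \phi^{(t)})$ and $p_i(\theta) = \Pr(U_i = 1 \mid \Theta = \theta, \phi_i)$. As in Equations 20 and 21,

$$\bar{n}_l^{(t)} = \sum_{w=1}^{s} \frac{r_w \Pr(U = u_w \mid \Theta = x_l, \phi^{(t)})\, A(x_l)}{P(U = u_w \mid \phi^{(t)})}, \tag{44}$$

$$\bar{r}_{li}^{(t)} = \sum_{w=1}^{s} \frac{r_w\, u_{wi} \Pr(U = u_w \mid \Theta = x_l, \phi^{(t)})\, A(x_l)}{P(U = u_w \mid \phi^{(t)})}, \tag{45}$$

$$P(U = u_w \mid \phi^{(t)}) = \sum_{l=1}^{q} \Pr(U = u_w \mid \Theta = x_l, \phi^{(t)})\, A(x_l). \tag{46}$$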



1. Start with pre-specified nodes and provisional values of item parameters and distribution parameters.

2. For each EM cycle, compute $\bar{n}_l^{(t)}$ and $\bar{r}_{li}^{(t)}$ of Equations 44 and 45, respectively, for each item $i$ and each quadrature node in the E-step.

3. In the M-step, use the Newton-Raphson iteration to solve EM-likelihood equations for distribution parameters $\beta$ via Equation 43, and obtain the item parameter estimates by solving the system of EM-likelihood equations for all item parameters $\phi$ via Equation 42.

4. The EM algorithm stops after convergence criteria are reached.
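
The four steps can be sketched in code under simplifying assumptions. The sketch below uses a Rasch model with the $\theta$ distribution fixed at $N(0, 1)$ and a hard-coded five-point Gauss-Hermite rule, so the distribution-parameter update of step 3 (Equation 43) is omitted and only the item difficulties are estimated. All function names and the response-pattern counts are hypothetical; this is an illustration of the E-step and M-step structure, not the BAHAF procedure itself.

```python
import math

def gauss_hermite_5():
    """Five-point Gauss-Hermite rule rescaled so the weights are N(0, 1) probabilities."""
    s10 = math.sqrt(10.0)
    inner, outer = math.sqrt(5.0 - s10), math.sqrt(5.0 + s10)
    w_inner, w_outer = (7.0 + 2.0 * s10) / 60.0, (7.0 - 2.0 * s10) / 60.0
    return ([-outer, -inner, 0.0, inner, outer],
            [w_outer, w_inner, 8.0 / 15.0, w_inner, w_outer])

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pattern_likelihood(u, theta, b):
    like = 1.0
    for u_i, b_i in zip(u, b):
        p = rasch_p(theta, b_i)
        like *= p if u_i == 1 else 1.0 - p
    return like

def e_step(pattern_counts, nodes, weights, b):
    """Expected counts (Equations 44 to 46) and the marginal log-likelihood at b."""
    q, k = len(nodes), len(b)
    n_bar = [0.0] * q
    r_bar = [[0.0] * k for _ in range(q)]
    loglik = 0.0
    for u_w, r_w in pattern_counts.items():
        joint = [pattern_likelihood(u_w, x, b) * A for x, A in zip(nodes, weights)]
        marginal = sum(joint)                       # Equation 46
        loglik += r_w * math.log(marginal)
        for l in range(q):
            post = r_w * joint[l] / marginal
            n_bar[l] += post                        # Equation 44
            for i in range(k):
                r_bar[l][i] += u_w[i] * post        # Equation 45
    return n_bar, r_bar, loglik

def m_step(n_bar, r_bar, nodes, b, tol=1e-10, max_iter=50):
    """Item-by-item Newton-Raphson solution of the quadrature form of Equation 42."""
    new_b = []
    for i, b_i in enumerate(b):
        for _ in range(max_iter):
            resid = info = 0.0
            for l, x in enumerate(nodes):
                p = rasch_p(x, b_i)
                resid += r_bar[l][i] - n_bar[l] * p
                info += n_bar[l] * p * (1.0 - p)
            step = resid / info
            b_i -= step                             # harder item when resid < 0
            if abs(step) < tol:
                break
        new_b.append(b_i)
    return new_b

def fit(pattern_counts, tol=1e-8, max_cycles=500):
    nodes, weights = gauss_hermite_5()              # step 1: fixed nodes, start at b = 0
    b = [0.0] * len(next(iter(pattern_counts)))
    history = []
    for _ in range(max_cycles):
        n_bar, r_bar, loglik = e_step(pattern_counts, nodes, weights, b)   # step 2
        history.append(loglik)
        new_b = m_step(n_bar, r_bar, nodes, b)                             # step 3
        change = max(abs(x - y) for x, y in zip(new_b, b))
        b = new_b
        if change < tol:                            # step 4: convergence criterion
            break
    return b, history

# Hypothetical counts r_w for the 8 response patterns of a 3-item test.
pattern_counts = {
    (0, 0, 0): 12, (1, 0, 0): 18, (0, 1, 0): 8, (0, 0, 1): 4,
    (1, 1, 0): 20, (1, 0, 1): 10, (0, 1, 1): 6, (1, 1, 1): 22,
}
b_hat, loglik_history = fit(pattern_counts)
```

With the nodes and weights held fixed, each cycle is an exact EM step for the discrete representation, so the recorded marginal log-likelihood should not decrease from one cycle to the next. Re-estimating the weights or a variance parameter would add an Equation 43 update inside `fit`.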



Note again that the maximization in the M-step can be performed for $\phi$ and $\beta$ separately, and the maximization for $\phi$ can be performed item by item. If $\theta$ is discrete, then the BAHAF procedure is similar to those described by Mislevy and Bock (1985; i.e., the BAM procedure) or Woodruff and Hanson (1996), where $\beta$ is the vector of discrete probabilities. That is, the quadrature points and weights are changed to the discrete points and probabilities. One application of BAHAF is for the one-parameter logistic (Rasch) model. With a Rasch model in which the mean of $\theta$ is set to zero, BAHAF can be used to estimate the variance of $\theta$ as well as the item difficulty parameters (Mislevy & Bock, 1985; Rigdon & Tsutakawa, 1983).

Therefore, theoretically the Bock-Aitkin procedure (see Equations 40 and 41) is a reformulation of the EM algorithm (see Equation 27) for continuous and discrete $\theta$, and can include estimation of the $\theta$-distribution parameters. Lewis (1985) had noticed that the Bock-Aitkin procedure is a special case of the EM algorithm.

Discussion 

In this paper, it was shown that the Bock-Aitkin procedure is an instance of EM/MMLE when $\theta$ is either continuous or discrete. The Bock-Aitkin procedures such as BA1, BA2, and BAM (see the introduction section) can be thought of as numerical implementations of the EM algorithm. Consequently, the procedure theoretically inherits the statistical properties of the EM algorithm. For example, the convergence of the Bock-Aitkin procedure is not guaranteed theoretically. Even if the procedure does converge, the value it converges to may not be the MML estimate. To date, among IRT models only the one-parameter logistic model with the normal $\theta$ distribution has been shown to converge to the MML estimate under the EM algorithm, assuming that the algorithm converges to a finite value (Tsutakawa, 1985).

As Bock and Aitkin (1981) pointed out, the M-step of the Bock-Aitkin procedure performs a probit (or logit) analysis on $\bar{n}_l$ and $\bar{r}_{li}$ (see Equations 20 and 21, or 44 and 45). Because of this connection to the probit/logit analysis, the computation load of EM/MMLE is reduced. The key points are (a) the interchange of integration and summation on the third line of Equation 37 and (b) the formation of $n(\theta)^{(t)}$ and $r_i(\theta)^{(t)}$, so that the integral is not evaluated for each examinee for each item at each EM cycle.

The assumption of a continuous $\theta$ should be retained (unless $\theta$ is intended to be discrete). The $\theta$ scale can be discrete, because $\theta$ is unobserved and thus the scale is arbitrary. In addition, the test length is finite (a test can only be constructed with a finite number of items) and the number of distinct observed response vectors (or patterns) is finite (for item calibration the sample size of examinees usually is not large), so the number of ability levels that can be estimated with response patterns is limited. (Note that examinees with different $\theta$ values can produce the same response patterns, and different response patterns may be produced by the examinees with the same $\theta$ values.) However, when $\theta$ is estimated for the individual examinees by their response patterns, the estimated $\theta$ values may not match those values that were used in the earlier item calibration. Moreover, the Bock-Aitkin procedure was originally intended to solve the implementation difficulties of Bock and Lieberman’s MML estimation for a continuous $\theta$ scale. Therefore, the assumption of a continuous $\theta$ variable can be adopted. If evaluation of the integral is desired, then the concern of how many ability values should be used is a question of how many (quadrature) points are needed for the precision of a numerical integral.
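
The dependence of numerical precision on the number of quadrature points can be illustrated with a small sketch. It evaluates the marginal probability of one response pattern under a standard normal $\theta$ with 3-point and 5-point Gauss-Hermite rules and compares both with a dense reference grid. The two-parameter logistic items, the response pattern, and the function names are assumptions made for the example.

```python
import math

def gauss_hermite(n_points):
    """Hard-coded 3- and 5-point Gauss-Hermite rules with N(0, 1) probability weights."""
    if n_points == 3:
        s3 = math.sqrt(3.0)
        return [-s3, 0.0, s3], [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]
    if n_points == 5:
        s10 = math.sqrt(10.0)
        inner, outer = math.sqrt(5.0 - s10), math.sqrt(5.0 + s10)
        w_inner, w_outer = (7.0 + 2.0 * s10) / 60.0, (7.0 - 2.0 * s10) / 60.0
        return ([-outer, -inner, 0.0, inner, outer],
                [w_outer, w_inner, 8.0 / 15.0, w_inner, w_outer])
    raise ValueError("only 3 or 5 points are tabulated in this sketch")

def pattern_likelihood(u, theta, items):
    like = 1.0
    for u_i, (a, b) in zip(u, items):
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        like *= p if u_i == 1 else 1.0 - p
    return like

def marginal_probability(u, items, nodes, weights):
    """Quadrature approximation of the integral of Pr(U = u | theta) g(theta)."""
    return sum(pattern_likelihood(u, x, items) * A for x, A in zip(nodes, weights))

def fine_grid(lower=-8.0, upper=8.0, step=0.01):
    """Dense equally spaced grid with normal-density weights, used as a reference."""
    count = int(round((upper - lower) / step)) + 1
    nodes = [lower + m * step for m in range(count)]
    weights = [step * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) for x in nodes]
    return nodes, weights

items = [(1.0, 0.0), (1.0, 0.0)]     # hypothetical (a_i, b_i) pairs
u = (1, 0)
reference = marginal_probability(u, items, *fine_grid())
error_3 = abs(marginal_probability(u, items, *gauss_hermite(3)) - reference)
error_5 = abs(marginal_probability(u, items, *gauss_hermite(5)) - reference)
```

For this smooth integrand the five-point rule is closer to the reference than the three-point rule. How many points are adequate in practice depends on the items and the ability distribution, so the comparison is only indicative.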

On the other hand, Lewis (1985) agreed with Mislevy and Bock (1985, also see Bock, 1997) that discrete distributions should be used to approximate continuous ability distributions, because “without such a simplification, the practical applications of the [EM/]GEM algorithm would be limited to relatively small samples of persons” (Lewis, p. 208). Besides, in the Bock-Aitkin procedure the probit/logit analysis is linked to the M-step; that is, the complete-data likelihood is the likelihood of the probit analysis: This is what the interpretations of $n$ and $r$ (Equations 20 and 21) are based on. Since the number of levels of dosage (stimuli) is finite in the probit analysis, so is the number of $\theta$ values in the Bock-Aitkin (implementation) procedure. Although BA1 is described first where Gauss-Hermite quadrature is used, BAM where a discrete representation is used should be the typical Bock-Aitkin procedure. In other words, the use of numerical quadrature is a preprocess to obtain a discrete representation. Tsutakawa (1984) mentioned that the Bock-Aitkin procedure differs from EM/MMLE by using a discrete representation with predetermined ability levels. It should be noted that using numerical quadrature to evaluate the integrals (when there is no closed form to compute the integrals), and using a discrete distribution/representation with a finite number of points to approximate the continuous distribution are different approaches. Moreover, even the $\theta$ values are not necessarily prespecified, although the number of $\theta$ levels is prespecified. Bock (1997) mentioned that “if both the probabilities and the locations of the points are estimated jointly, in the manner of Kiefer and Wolfowitz, the latent distribution is characterized semiparametrically” (p. 29). The practical meaning is that to use BAM we do not need the assumption about the underlying $\theta$ distribution which can be estimated by BAM, except for the number of $\theta$ levels. Keep in mind that item parameter estimates are with respect to the sample of examinees. The number of $\theta$ levels affects MML estimation precision, but how well this “nonparametric” estimation of the $\theta$ distribution influences MML estimation may need further study.

Because the Bock-Aitkin procedure is an instance of the EM algorithm, the $\theta$-distribution parameters can be estimated along with item parameters, as described in the last section. However, there are two constraints needed to identify the two-parameter logistic model family, which includes the three-parameter logistic model (Birnbaum, 1968) with the lower asymptote parameter fixed (see Hambleton & Swaminathan, 1985; Tsutakawa, 1984, 1985). One way is to set the mean of $\theta$ to be zero and the variance to be one. If $\theta$ is assumed as discrete with a set of $q$ predetermined $\theta$ values, then in addition to item parameters, there are $q$ distribution parameters (i.e., the probabilities associated with each predetermined $\theta$ level) to be estimated subject to two constraints. The $q$ probabilities can be re-estimated at each EM cycle as described in the above extended procedure, but the mean and variance should be adjusted to meet the two constraints at each cycle or after the final cycle. The new transformed $\theta$ values may not be the same as the predetermined values. The item parameter estimates also need to be adjusted correspondingly. The necessity of this re-standardization for identifying item parameters when $\theta$ is discrete has been questioned by Lewis (1985, p. 205). But if both the probabilities and the locations of the points of a discrete representation are estimated jointly, Bock (1997, p. 29) regarded the (underlying) $\theta$ distribution as characterized semiparametrically. On the other hand, if $\theta$, throughout the study, is assumed to be fixed as $N(0, 1)$, then it is not necessary to use a discrete representation, but Gauss-Hermite quadrature can be used, and the quadrature points and associated weights do not need to be changed during the EM iteration, because the $\theta$ distribution is known and fixed.
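
The re-standardization described above can be sketched as follows, assuming a two-parameter logistic model written as $p_i(\theta) = 1/(1 + e^{-a_i(\theta - b_i)})$. If the discrete distribution has mean $\mu$ and standard deviation $\sigma$, the points are mapped to $(x_l - \mu)/\sigma$, and the item parameters to $a_i \sigma$ and $(b_i - \mu)/\sigma$, which leaves every response probability unchanged. The function names and numerical values are hypothetical.

```python
import math

def restandardize(levels, probs, items):
    """Rescale discrete ability points to mean 0, variance 1 and adjust 2PL parameters."""
    mean = sum(A * x for x, A in zip(levels, probs))
    sd = math.sqrt(sum(A * (x - mean) ** 2 for x, A in zip(levels, probs)))
    new_levels = [(x - mean) / sd for x in levels]
    new_items = [(a * sd, (b - mean) / sd) for a, b in items]
    return new_levels, new_items

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

levels = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = [0.05, 0.20, 0.35, 0.30, 0.10]   # hypothetical re-estimated probabilities
items = [(1.2, -0.3), (0.8, 0.6)]        # hypothetical (a_i, b_i) pairs
new_levels, new_items = restandardize(levels, probs, items)
```

After the transformation the points generally no longer coincide with the predetermined values, which is the situation the paragraph above describes.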

The EM algorithm perspective is more flexible than the Bock-Aitkin procedure perspective. If the quadrature-point technique is adopted, then other kinds of numerical quadrature techniques than Gauss-Hermite quadrature can be considered. Moreover, in addition to the Newton-Raphson method, other maximization methods can be considered in the M-step.

For some IRT models, the maximization (for example, using the Newton-Raphson itera- 
tion) in the M-step may not be attained. When this occurs, a generalized EM algorithm (GEM, 
see Equation 25) can be considered. 

Whereas the EM algorithm produces MLEs, Dempster et al. (1977) also described a mod- 
ification of the EM algorithm for Bayesian modal estimates (for applications of the Bayesian- 
approach EM algorithm to IRT see Harwell & Baker, 1991; Mislevy, 1986; Tsutakawa & Lin, 
1986). These two algorithms have fundamentally different philosophies of statistical inference 
(Lewis, 1985). In the Bayesian approach, item parameters are also treated as random variables 
with prior distributions, that is, as a random sample from a large population (item pool) (Rigdon 
& Tsutakawa, 1987). In IRT, the Bayesian approach is used, as a numerical tool, to prevent 
parameter estimates from becoming indefinitely large (e.g., Mislevy & Bock, 1993) or to con- 
trol the parameter estimates within a desired range. The choice of prior distributions for item 
parameters is often made in an arbitrary manner. The procedure is for Mislevy’s marginalized 
Bayesian estimation (see Harwell & Baker, 1991), not MML estimation. 